Clock power minimization with regular physical placement of clock repeater components

ABSTRACT

Power, routability and electromigration have become crucial issues in modern microprocessor designs. In high performance designs, clocks are the highest consumer of power. Arranging clocking components with regularity so as to minimize the capacitance on the clock nets can help reduce clock power; however, it may hurt performance due to some loss of flexibility in physically placing those components. The present invention provides techniques to optimally place clock components in a regular fashion so as to minimize clock power within a performance constraint. A rectangular grid is created and clock distribution structures are assigned to the grid intersection points. Latches are then located around the clock distribution structures to minimize an overall distance for connections between the latches and respective clock distribution structures. The horizontal and vertical pitches of the grid may be independently adjusted to achieve a more uniform spread of the clock distribution structures.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

This invention was made with Government support under DARPA, HR0011-07-9-0002. THE GOVERNMENT HAS CERTAIN RIGHTS IN THIS INVENTION.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention generally relates to the fabrication and design of semiconductor chips and integrated circuits, and more particularly to a method of designing the physical layout (placement) of latches and other logic cells which receive clock signals from clock distribution structures such as local clock buffers.

2. Description of the Related Art

Integrated circuits are used for a wide variety of electronic applications, from simple devices such as wristwatches to the most complex computer systems. A microelectronic integrated circuit (IC) chip can generally be thought of as a collection of logic cells with electrical interconnections between the cells, formed on a semiconductor substrate (e.g., silicon). An IC may include a very large number of cells and require complicated connections between the cells. A cell is a group of one or more circuit elements such as transistors, capacitors, resistors, inductors, and other basic circuit elements grouped to perform a logic function. Cell types include, for example, core cells, scan cells and input/output (I/O) cells. Each of the cells of an IC may have one or more pins, each of which in turn may be connected to one or more other pins of the IC by wires. The wires connecting the pins of the IC are also formed on the surface of the chip. For more complex designs, there are typically at least four distinct layers of conducting media available for routing, such as a polysilicon layer and three metal layers (metal-1, metal-2, and metal-3). The polysilicon layer, metal-1, metal-2, and metal-3 are all used for vertical and/or horizontal routing.

An IC chip is fabricated by first conceiving the logical circuit description, and then converting that logical description into a physical description, or geometric layout. This process is usually carried out using a “netlist,” which is a record of all of the nets, or interconnections, between the cell pins. A layout typically consists of a set of planar geometric shapes in several layers. The layout is then checked to ensure that it meets all of the design requirements, particularly timing requirements. The result is a set of design files known as an intermediate form that describes the layout. The design files are then converted into pattern generator files that are used to produce patterns called masks by an optical or electron beam pattern generator. During fabrication, these masks are used to pattern a silicon wafer using a sequence of photolithographic steps. The process of converting the specifications of an electrical circuit into a layout is called the physical design.

Cell placement in semiconductor fabrication involves a determination of where particular cells should optimally (or near-optimally) be located on the surface of an integrated circuit device. Due to the large number of components and the details required by the fabrication process for very large scale integrated (VLSI) devices, physical design is not practical without the aid of computers. As a result, most phases of physical design extensively use computer-aided design (CAD) tools, and many phases have already been partially or fully automated. Automation of the physical design process has increased the level of integration, reduced turnaround time and enhanced chip performance. Several different programming languages have been created for electronic design automation (EDA) including Verilog, VHDL and TDML. A typical EDA system receives one or more high level behavioral descriptions of an IC device, and translates this high level design language description into netlists of various levels of abstraction.

While current placement techniques provide adequate placement of cells with regard to their data interconnections, there is an additional challenge for the designer in constructing a clock network for the cells and this challenge is becoming more difficult with the latest technologies like low-power, 65-nanometer integrated circuits. Low power circuits (e.g., around 20 watts or less for microprocessor chips) are becoming more prevalent due to power consumption problems. In particular, power dissipation has become a limiting factor for the yield of high-performance circuit designs (operating at frequencies around 1 gigahertz or more) with deep submicron technology. Clock nets can contribute up to 50% of the total active power in multi-GHz designs. Low power designs are also preferable since they exhibit less power supply noise and provide better tolerance with regard to manufacturing variations.

There are several techniques for minimizing power while still achieving timing objectives for high performance, low power systems. One method involves the use of local clock buffers (LCBs) to distribute the clock signals. A typical clock control system has a clock generation circuit (e.g., a phase-lock loop) that generates a master clock signal which is fed to a clock distribution network that renders synchronized global clock signals at the LCBs. Each LCB adjusts the global clock duty cycle and edges to meet the requirements of respective circuit elements, e.g., local logic circuits or latches. Since this clock network is one of the largest power consumers among all of the interconnects, it is further beneficial to control the capacitive load of the LCBs, each of which is driving a set of many clock sinks. One approach for reducing the capacitive load is latch clustering, i.e., clusters of latches placed near the respective LCB of their clock domain. Latch clustering combined with LCBs can significantly reduce the total clock wire capacitance which in turn reduces overall clock power consumption. Since most of the latches are placed close to an LCB, clock skew is also reduced which helps improve the timing of the circuit.

Conventional placement with LCBs and latch clustering is illustrated in the flow chart of FIG. 1. The process begins with an initial placement based on a layout for the circuit (1). The layout can be provided by an EDA tool, or can simply be a random layout for the circuit elements. The initial placement locates all circuit elements, including clock sinks, in a region of the integrated circuit using for example quadratic placement. Other placement techniques may be used but quadratic placement often produces better results than alternatives such as min-cut based placement. The quadratic placement portion of the process solves the linear system Ax=b where A is an optimization matrix, and x and b are vectors. During quadratic placement, cells are recursively partitioned into smaller bins until a target number of objects per bin is reached, such as five objects per bin. For the initial placement, all wires (edges) have the same net-weighting. The timing of the circuit is then analyzed and adjusted in early optimization (2). This optimization may include gate re-sizing and buffer insertion using a grid system such as a 50×50 grid in which buffers are assigned to grid cells having lower logic densities. A weighted placement (3) follows which is similar to step 1, but in the weighted placement the beginning layout is the output of the early optimization step 2 and different weights are applied to different edges based on the timing constraints. The partitioning may also be finer for the weighted placement, e.g., recursively partitioning until there are around two objects per bin. The weighted placement is then followed by late optimization which provides different logic optimizations such as buffer insertion but at a finer (or sometimes the same) level, e.g., in a 100×100 grid (4). Late optimization may be the same as early optimization, the conceptual difference being that early optimization works on a circuit which is never processed by layout-driven optimization steps.
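As a rough illustration of the linear system Ax=b mentioned above, the following sketch places cells along a single axis by minimizing the weighted sum of squared wire lengths. It is a simplified, one-dimensional reading of quadratic placement, not the formulation of any particular EDA tool; the function names `solve_linear` and `quadratic_place_1d`, and the `nets`/`pads` input format, are assumptions made for this example.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def quadratic_place_1d(n_cells, nets, pads):
    """Minimize sum of w * (difference in position)^2 along one axis.

    nets: (i, j, w) connections between movable cells i and j (hypothetical format)
    pads: (i, pos, w) connections from movable cell i to a fixed position
    """
    A = [[0.0] * n_cells for _ in range(n_cells)]
    b = [0.0] * n_cells
    for i, j, w in nets:
        A[i][i] += w
        A[j][j] += w
        A[i][j] -= w
        A[j][i] -= w
    for i, pos, w in pads:
        A[i][i] += w
        b[i] += w * pos
    return solve_linear(A, b)


# Two cells chained between fixed pads at 0 and 3, all weights equal.
positions = quadratic_place_1d(2, nets=[(0, 1, 1.0)], pads=[(0, 0.0, 1.0), (1, 3.0, 1.0)])
print(positions)
```

With equal weights the two cells spread evenly between the pads; raising the weight of one edge (as in the weighted placement of step 3) pulls its endpoints closer together.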

Steps 1 and 3 of FIG. 1 do not differentiate between latches and other (non-clocked) logic cells, so at first the latches are allowed to move freely according to placement tools driven by data path timing. In the following steps the process focuses on the latches only, i.e., latches that are part of one or more clock domains. Latches are grouped into a given cluster based on locality and clock domain, and the LCB for a given clock domain is located at the centroid of the latch clusters and the latches are pulled to the LCB (5). For this latch-LCB driven placement, the size of the LCBs is temporarily shrunk to the same width as a latch. A relatively high weighting (attraction) is applied to the interconnections between the latch and the LCB for this placement step, e.g., by a factor of 10 compared to the net weights of the most critical data paths. In this manner all latches will be placed next to the corresponding LCB, which is then readjusted back to its original size. Further placement and optimization may be performed with the locations of the LCBs and latches fixed (6). The final step is detailed placement and optimization which refines the layout using for example min-cut placement or heuristic techniques (7).

The resulting LCB-latch structure can be very large relative to other circuit elements involved in the placement process, and greatly impacts the timing of the circuit. As the number of such LCB-latch clusters grows in more complex circuits, various problems arise including temperature distribution, clock latency, and inefficient use of LCBs with respect to drive capacity. It would, therefore, be desirable to devise an improved placement method which could optimally place clock components so as to reduce or minimize clock power within a performance constraint. It would be further advantageous if the method could retain flexibility in physical placement of the clock components to reduce clock skew.

SUMMARY OF THE INVENTION

It is therefore one object of the present invention to provide a placement method which improves clock power while reducing clock skew.

It is another object of the present invention to provide such a method which results in better temperature distribution within an integrated circuit.

It is yet another object of the present invention to provide such a method which reduces clock latency from a global clock grid to local clock buffers to improve clock cycle time.

The foregoing objects are achieved in a method for designing the layout of a plurality of latches in one or more clock domains of an integrated circuit by receiving an input layout of the latches, creating a rectangular grid in a region of the integrated circuit containing the latches, the grid having grid intersection points defined by a horizontal pitch and a vertical pitch, assigning clock distribution structures to the grid intersection points, and locating the latches around the clock distribution structures to minimize an overall distance for connections between the latches and respective clock distribution structures. The assignment of clock distribution structures may include bipartite matching of the clock distribution structures to the grid intersection points for all corresponding clock domains, while the latch location may also include bipartite matching of latches to corresponding clock distribution structures. The horizontal and vertical pitches may be independently adjusted to achieve a more uniform spread of the clock distribution structures. The grid preferably has a perimeter defined by a bounding box of the circuit layout. The grid may have the same or different horizontal pitch and vertical pitch. If the pitches are the same, they may be calculated based on rectangular dimensions of the grid and a predetermined number of LCBs to be utilized.

The above as well as additional objectives, features, and advantages of the present invention will become apparent in the following detailed written description.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention may be better understood, and its numerous objects, features, and advantages made apparent to those skilled in the art by referencing the accompanying drawings.

FIG. 1 is a chart illustrating the logical flow for a conventional latch placement technique with clock optimization followed by latch-LCB driven placement;

FIG. 2 is a block diagram of a computer system programmed to carry out computer-aided design of an integrated circuit in accordance with one implementation of the present invention;

FIG. 3 is a chart illustrating the logical flow for regular physical placement of clock repeater components according to one implementation of the present invention;

FIG. 4 is a plan view of a region of an integrated circuit which is divided into a grid in accordance with the present invention wherein the clock repeater components are located at intersection points of the grid;

FIG. 5 is a graphical representation of a bipartite solution which may be used for LCB-to-clock assignment and latch-to-LCB assignment; and

FIGS. 6A-6E are plan views for layouts of an integrated circuit which start with an input layout of latches in one or more clock domains, and subsequent placement of local clock buffers in accordance with one implementation of the present invention.

The use of the same reference symbols in different drawings indicates similar or identical items.

DESCRIPTION OF THE PREFERRED EMBODIMENT(S)

With reference now to the figures, and in particular with reference to FIG. 2, there is depicted one embodiment 10 of a computer system in which the present invention may be implemented. Computer system 10 is a symmetric multiprocessor (SMP) system having a plurality of processors 12a, 12b connected to a system bus 14. System bus 14 is further connected to a combined memory controller/host bridge (MC/HB) 16 which provides an interface to system memory 18. System memory 18 may be a local memory device or alternatively may include a plurality of distributed memory devices, preferably dynamic random-access memory (DRAM). There may be additional structures in the memory hierarchy which are not depicted, such as on-board (L1) and second-level (L2) or third-level (L3) caches.

MC/HB 16 also has an interface to peripheral component interconnect (PCI) Express links 20a, 20b, 20c. Each PCI Express (PCIe) link 20a, 20b is connected to a respective PCIe adaptor 22a, 22b, and each PCIe adaptor 22a, 22b is connected to a respective input/output (I/O) device 24a, 24b. MC/HB 16 may additionally have an interface to an I/O bus 26 which is connected to a switch (I/O fabric) 28. Switch 28 provides a fan-out for the I/O bus to a plurality of PCI links 20d, 20e, 20f. These PCI links are connected to more PCIe adaptors 22c, 22d, 22e which in turn support more I/O devices 24c, 24d, 24e. The I/O devices may include, without limitation, a keyboard, a graphical pointing device (mouse), a microphone, a display device, speakers, a permanent storage device (hard disk drive) or an array of such storage devices, an optical disk drive, and a network card. Each PCIe adaptor provides an interface between the PCI link and the respective I/O device. MC/HB 16 provides a low latency path through which processors 12a, 12b may access PCI devices mapped anywhere within bus memory or I/O address spaces. MC/HB 16 further provides a high bandwidth path to allow the PCI devices to access memory 18. Switch 28 may provide peer-to-peer communications between different endpoints and this data traffic does not need to be forwarded to MC/HB 16 if it does not involve cache-coherent memory transfers. Switch 28 is shown as a separate logical component but it could be integrated into MC/HB 16.

In this embodiment, PCI link 20c connects MC/HB 16 to a service processor interface 30 to allow communications between I/O device 24a and a service processor 32. Service processor 32 is connected to processors 12a, 12b via a JTAG interface 34, and uses an attention line 36 which interrupts the operation of processors 12a, 12b. Service processor 32 may have its own local memory 38, and is connected to read-only memory (ROM) 40 which stores various program instructions for system startup. Service processor 32 may also have access to a hardware operator panel 42 to provide system status and diagnostic information.

In alternative embodiments computer system 10 may include modifications of these hardware components or their interconnections, or additional components, so the depicted example should not be construed as implying any architectural limitations with respect to the present invention.

When computer system 10 is initially powered up, service processor 32 uses JTAG interface 34 to interrogate the system (host) processors 12a, 12b and MC/HB 16. After completing the interrogation, service processor 32 acquires an inventory and topology for computer system 10. Service processor 32 then executes various tests such as built-in-self-tests (BISTs), basic assurance tests (BATs), and memory tests on the components of computer system 10. Any error information for failures detected during the testing is reported by service processor 32 to operator panel 42. If a valid configuration of system resources is still possible after taking out any components found to be faulty during the testing then computer system 10 is allowed to proceed. Executable code is loaded into memory 18 and service processor 32 releases host processors 12a, 12b for execution of the program code, e.g., an operating system (OS) which is used to launch applications and in particular the circuit design application of the present invention, results of which may be stored in a hard disk drive (I/O device 24).

While the illustrative implementation provides program instructions embodying the present invention on the hard disk, those skilled in the art will appreciate that the invention can be embodied in a program product utilizing other computer-readable media. The program instructions may be written using the C++ programming language for an AIX environment. Computer system 10 carries out program instructions for placement of clock sinks in the design of an integrated circuit using a novel technique wherein the sinks are preferably initially placed and optimized by conventional methods and thereafter are optimized for clock power as explained further below. Accordingly, a program embodying the invention may include conventional aspects of various quadratic optimizers, cut-based partitioners, buffer insertion tools, etc. and these details will become apparent to those skilled in the art upon reference to this disclosure. Although these clock sinks are referred to herein as latches, this term includes devices such as flip-flops, dynamic logic circuits, or any combination of these and other clocked circuit elements. The integrated circuit designed in accordance with the present invention may for example be a random logic module (macro).

The present invention may be further understood with reference to the chart of FIG. 3 which illustrates the logical flow according to one implementation of the present invention. The process begins when computer system 10 receives input data in the form of a netlist or other circuit description with a source and multiple sinks for each clock domain, as well as the clock domain information (prior to initial placement the design may be clock traced to assign latches to their respective domain groups). In this regard, the term “clock domain” generally refers to any non-data signal that is used to gate one or more sinks. The input data may also include gating source information for the domain group. A starting layout of the sinks can be provided by an EDA tool, or can simply be a random layout for the circuit elements.

The automated method starts with an initial quadratic placement of the sinks (latches) together with other logic cells of the circuit, in which they are recursively partitioned into smaller bins until a threshold number of objects per bin (e.g., five) is reached (50). For the initial placement, all wires (edges) have equal weighting. The timing of the circuit is then analyzed and adjusted in early optimization, including gate re-sizing and buffer insertion using, e.g., a 10×10 grid in which buffers are assigned to grid cells having lower logic densities (52). A net-weight driven placement follows which is similar to the initial placement, but in the net-weight driven placement different weights are applied to different edges according to their criticality in meeting timing constraints, and the circuit elements are recursively partitioned until there are a smaller number of objects per bin, e.g., two (54). Late optimization performs gate resizing and buffer insertion with, e.g., a 100×100 grid (56).

The results of late optimization provide an input layout of the latches for further clock network optimization using clock distribution structures such as splitters or local clock buffers (LCBs). The clock network construction begins by creating a rectangular grid within the available region of the integrated circuit (58). The grid may have the same horizontal and vertical pitch, or if only one pitch is constrained the other pitch may be adjusted. Alternatively both pitches may be constrained by the designer. Details of the grid creation are discussed further below in conjunction with FIG. 4. Once the grid is established, LCBs are assigned to the grid intersection points (60). In the physical design the LCBs are placed directly under (or over) clock pins of the integrated circuit at these grid intersection points. Arranging these clocking components with some regularity lowers the capacitance of the clock network and thereby reduces clock power. The latches are then located with regard to the placed LCBs (62). LCB-to-clock assignment and latch location may be performed using the bipartite matching algorithm discussed further below in conjunction with FIG. 5.

Latch-LCB driven placement follows wherein the size of the LCBs is temporarily shrunk to the same width as a latch and a relatively high weighting (attraction) is applied to the interconnections between the latch and the LCB for this placement step, e.g., by a factor of 10 compared to the net weights of the most critical data paths (64). All latches are placed next to their corresponding LCB, which is then readjusted back to its original size. The final step is detailed placement which refines the layout using for example min-cut placement or heuristic techniques with a placer such as TIMBERWOLF (66).

Referring now to FIG. 4, a rectangular grid 70 is shown which is used to locate the LCBs. It is preferable to restrict the region in which to place the LCBs (i.e., the grid area) to a bounding box which surrounds all of the latches in order to avoid a skewed distribution. Grid 70 has a vertical pitch v and a horizontal pitch h which define grid intersection points. If there are no constraints on the pitches, the regularity of the grid is maximized by providing equal horizontal and vertical pitches, i.e., v=h, which is indicated by the solid interior lines of grid 70. The value of the pitches is then calculated from the dimensions of the available region of the integrated circuit and a predetermined number of LCBs to be utilized. Specifically, it can be shown that the pitch h=⌊h_(α)⌋, i.e., the largest integer less than or equal to h_(α), according to the formula

$h_{\alpha} = \frac{-(W + L) + \sqrt{(W + L)^{2} + 4WL(l - 1)}}{2(l - 1)}$

where W is the width (horizontal dimension) of the region, L is the length (vertical dimension) of the region, and l is the number of LCBs to be utilized. After net-weight driven placement (54) the designer knows how many LCBs are necessary based on parameters such as the LCB drive strength (fan-out constraint) and capacitive loading. The number of LCBs in a given row is then equal to (W/h−1) and the number of LCBs in a given column is equal to (L/h−1).
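The pitch formula can be checked numerically. Algebraically, h_(α) is the positive root of (l−1)h² + (W+L)h − WL = 0, which is the same as requiring (W/h−1)(L/h−1) = l, matching the row and column counts stated above. The sketch below is only an illustration; the function name `equal_pitch` and the sample dimensions are assumptions, not values from the disclosure.

```python
import math


def equal_pitch(W, L, l):
    """Return (h_alpha, h) for a W-by-L region and l LCBs, with h = floor(h_alpha).

    h_alpha is the positive root of (l - 1) * h**2 + (W + L) * h - W * L = 0.
    """
    if l < 2:
        raise ValueError("the formula divides by (l - 1), so l must be at least 2")
    s = W + L
    h_alpha = (-s + math.sqrt(s * s + 4.0 * W * L * (l - 1))) / (2.0 * (l - 1))
    return h_alpha, math.floor(h_alpha)


# Hypothetical 30 x 30 region with four LCBs.
h_alpha, h = equal_pitch(30.0, 30.0, 4)
per_row = 30.0 / h - 1      # LCBs in a given row, W/h - 1
per_column = 30.0 / h - 1   # LCBs in a given column, L/h - 1
print(h_alpha, h, per_row * per_column)
```

For this sample the pitch comes out as a whole number, so the grid offers exactly four intersection points. When h_(α) is not an integer, rounding down yields a slightly finer grid with at least as many intersection points as LCBs.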

Either or both of the horizontal and vertical pitches may be specified. If only one pitch is constrained then the other may be adjusted for further optimality, e.g., if h is constrained then v is adjusted. The horizontal pitch may be selected based on clock pre-wires. An exemplary value is 29.1 μm. The vertical pitch is then chosen so as to make the LCBs as far apart as possible.
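One plausible reading of "as far apart as possible" is to take the largest vertical pitch that still leaves enough intersection points for all LCBs once the horizontal pitch is fixed. The sketch below follows that reading and reuses the (W/h−1) and (L/v−1) counts from the equal-pitch case; the function name `vertical_pitch` and this selection rule are assumptions rather than the disclosed procedure.

```python
import math


def vertical_pitch(W, L, h, n_lcbs):
    """Largest v such that (W/h - 1) * (L/v - 1) >= n_lcbs (assumed selection rule)."""
    per_row = int(W / h + 1e-9) - 1          # intersection points in one row
    if per_row < 1:
        raise ValueError("horizontal pitch too large for the region width")
    rows = math.ceil(n_lcbs / per_row)       # fewest rows that hold all LCBs
    return L / (rows + 1)                    # from rows = L/v - 1


# Hypothetical region: width 120, length 100, fixed h = 30, six LCBs.
print(vertical_pitch(120.0, 100.0, 30.0, 6))
```

Fewer rows means a larger v, so choosing the minimum number of rows spreads the LCBs as widely as the region allows in the vertical direction.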

With further reference to FIG. 5, the assignment of LCBs to grid intersection points is preferably carried out using clock domain matching. A grid may have more intersections than LCBs, and LCBs may also belong to different clock domains, so the situation should be avoided wherein a particular LCB in one clock domain is assigned to a location (intersection point) such that all or most of the latches in that clock domain are very far away from that location. In order to make LCB assignment latch location aware, it can be modeled as a minimum-cost, maximum-flow problem solved by bipartite matching. A source 72 fans out to multiple potential LCB locations 74 in the grid. Each LCB (L_(i)) can be part of only one clock domain 76, and the clock domains 76 have a sink 78. Each link of these logical relationships has an associated cost and capacity. For the source-to-grid intersection links the cost and capacity are shown as <0:1>. For the grid intersection-to-clock links the cost and capacity are shown as <a:1> where a is the minimum distance d(L_(i), C_(j)) from grid intersection point L_(i) to clock domain C_(j) such that there exists a minimum number of registers (e.g., 20) in that clock domain within that distance. For the clock-sink links the cost and capacity are <0:b> where b (written b_(j) for a particular domain) is the number of LCBs needed in clock domain C_(j). Grid intersection points are matched to clock domains by minimizing the total cost while maintaining maximum flow (capacity). Finding the maximum-flow, minimum-cost solution to the graph ensures that each grid intersection point (L_(i)) will be used at most once, because the flow on the source-to-grid intersection link is at most one. Since the clock domain-to-sink flow is at most b_(j), each clock domain C_(j) is assigned b_(j) grid intersection points for LCB placement for the maximum-flow solution. Since the solution is minimum-cost, this clock domain matching eliminates excessively long connections.
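The network of FIG. 5 can be sketched in a few lines. The example below builds the source, grid-point, clock-domain and sink nodes with the <cost:capacity> labels described above and solves it with a textbook successive-shortest-path routine; the disclosure does not say which solver is used, so that choice, the function names, and the sample costs are all assumptions. `domain_cost` shows one way to compute the link cost a as the distance to the k-th nearest latch of a domain, using Manhattan distance.

```python
def min_cost_max_flow(n, edges, s, t):
    """Successive shortest paths (Bellman-Ford). edges: (u, v, capacity, cost)."""
    graph = [[] for _ in range(n)]
    handles = []
    for u, v, cap, cost in edges:
        graph[u].append([v, cap, cost, len(graph[v])])       # forward edge
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])    # residual edge
        handles.append((u, len(graph[u]) - 1, cap))
    flow = total = 0
    while True:
        dist = [float("inf")] * n
        prev = [None] * n
        dist[s] = 0
        for _ in range(n - 1):
            changed = False
            for u in range(n):
                if dist[u] == float("inf"):
                    continue
                for idx, (v, cap, cost, _) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v], prev[v], changed = dist[u] + cost, (u, idx), True
            if not changed:
                break
        if prev[t] is None:
            break                                            # no augmenting path left
        push, v = float("inf"), t
        while v != s:                                        # find bottleneck
            u, idx = prev[v]
            push, v = min(push, graph[u][idx][1]), u
        v = t
        while v != s:                                        # augment along the path
            u, idx = prev[v]
            graph[u][idx][1] -= push
            graph[v][graph[u][idx][3]][1] += push
            v = u
        flow += push
        total += push * dist[t]
    return flow, total, [cap - graph[u][idx][1] for u, idx, cap in handles]


def domain_cost(point, domain_latches, min_regs):
    """Cost a: smallest distance containing min_regs latches of the domain."""
    d = sorted(abs(point[0] - x) + abs(point[1] - y) for x, y in domain_latches)
    return d[min_regs - 1] if len(d) >= min_regs else float("inf")


def assign_grid_points(cost, need):
    """cost[i][j] = d(L_i, C_j); need[j] = b_j LCBs required by domain j."""
    n_pts, n_dom = len(cost), len(need)
    s, t = 0, 1 + n_pts + n_dom
    edges = [(s, 1 + i, 1, 0) for i in range(n_pts)]          # <0:1>
    pairs = [(i, j) for i in range(n_pts) for j in range(n_dom)]
    edges += [(1 + i, 1 + n_pts + j, 1, cost[i][j]) for i, j in pairs]  # <a:1>
    edges += [(1 + n_pts + j, t, need[j], 0) for j in range(n_dom)]     # <0:b>
    flow, total, used = min_cost_max_flow(t + 1, edges, s, t)
    chosen = [p for k, p in enumerate(pairs) if used[n_pts + k] == 1]
    return flow, total, chosen


# Three candidate grid points, two clock domains needing one LCB each.
print(assign_grid_points([[1, 9], [4, 2], [3, 3]], [1, 1]))
```

In this small case grid point 0 goes to domain 0 and grid point 1 to domain 1, leaving grid point 2 unused, which reflects the remark that a grid may have more intersections than LCBs.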

The latch-LCB connections may be similarly solved using bipartite matching, but in this case the circles 74 of FIG. 5 represent latches r₁, r₂, . . . , r_(n) and the squares 76 represent LCBs l₁, l₂, . . . , l_(m). The cost and capacity for each source-latch link are still <0:1>. The cost and capacity of the interior link (latch-LCB) are <a:1> where a is now the distance between the i^(th) latch and the j^(th) LCB, denoted by d(r_(i), l_(j)). This distance is set to infinity for any connection that is longer than a threshold distance (the minimum distance which if exceeded would lead to a clock violation). This distance is also set to infinity for any latch-LCB pair whose members belong to different clock domains. The cost and capacity for the LCB-sink links are <0:b> where b is now the maximum number of latches that an LCB can drive. The latch-LCB matching minimizes the sum of the distances over all of the latch-LCB connections, i.e., minimizes (under the constraint that flow is maximum) the sum:

$\sum\limits_{i = 1}^{n}{\sum\limits_{j = 1}^{m}{d\left( r_{i},l_{j} \right)x\left( i,j \right)}}$

where x(i,j) is one if there is a connection between r_(i) and l_(j), otherwise it is zero. Manhattan distance or Euclidean distance may be used as the objective to be minimized.
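To make the objective concrete, the sketch below builds the link cost d(r_(i), l_(j)) with the two "infinity" rules (different clock domain, or beyond the threshold distance) and finds the assignment that minimizes the double sum for a tiny instance. It deliberately swaps the flow solver for an exhaustive search, which is only practical for a handful of latches; the tuple layout `(x, y, domain)`, the function names, and the sample coordinates are assumptions, and Manhattan distance is used.

```python
from itertools import product

INF = float("inf")


def link_cost(latch, lcb, threshold):
    """d(r_i, l_j) with the infinity rules; latch and lcb are (x, y, domain)."""
    if latch[2] != lcb[2]:
        return INF                       # different clock domains
    d = abs(latch[0] - lcb[0]) + abs(latch[1] - lcb[1])
    return d if d <= threshold else INF  # too long: would cause a clock violation


def best_assignment(latches, lcbs, threshold, max_fanout):
    """Exhaustive search (stand-in for min-cost flow) over latch-to-LCB choices.

    Assumes every latch can be connected; returns (INF, None) otherwise.
    """
    best_cost, best = INF, None
    for choice in product(range(len(lcbs)), repeat=len(latches)):
        if any(choice.count(j) > max_fanout for j in range(len(lcbs))):
            continue                     # LCB-sink capacity b exceeded
        cost = sum(link_cost(latches[i], lcbs[j], threshold)
                   for i, j in enumerate(choice))
        if cost < best_cost:
            best_cost, best = cost, choice
    return best_cost, best


# Four latches and two LCBs in one hypothetical clock domain "A".
latches = [(0, 0, "A"), (1, 0, "A"), (9, 0, "A"), (10, 1, "A")]
lcbs = [(0, 1, "A"), (10, 0, "A")]
print(best_assignment(latches, lcbs, threshold=5, max_fanout=2))
```

Here x(i,j) corresponds to `choice[i] == j`. The two left-hand latches attach to the first LCB and the two right-hand latches to the second, since any crossing connection exceeds the threshold and is priced at infinity.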

FIGS. 6A-6E illustrate different stages of a design in the placement process of the present invention. FIG. 6A shows only the latches for an input layout 80a of an integrated circuit that may be derived using EDA tools as discussed above, prior to generation of the clock network. Input layout 80a includes a plurality of latches which can be used to define a bounding box 82. FIG. 6B shows a grid with equal vertical and horizontal pitch applied to layout 80a, which has been used in FIG. 6C to locate LCBs 84 in a layout 80b wherein the latches are in the same locations, and the LCBs are matched to the latches using the bipartite solution. FIG. 6D illustrates how the vertical and horizontal pitch may be independently adjusted or tuned to achieve more uniform spread for a layout 80c. This pitch adjustment changes some latch-to-LCB assignments. In FIG. 6E a final layout 80d has shifted the latches closer to LCBs 84 using latch-LCB driven placement to further minimize overall latch-to-LCB connection distance.

In the resulting circuit macro design, since each clock buffer is placed underneath a macro clock input pin and the latches driven by this clock buffer are placed closely surrounding it, there is near-zero wire capacitance between clock buffers and macro clock pins, and clock signals arrive at each latch at nearly the same time. The regular placement of clock buffers on pins thus improves clock power while reducing clock skew, reducing clock latency, and controlling the electro-migration effect, and the ability to adjust the grid intersection points for placing the buffers imparts further flexibility to the optimization process. This optimization also includes consideration of the preferred sink locations and does not require that they be moved long distances. The present invention thereby provides more design predictability and greater utilization of the maximum LCB latch drive strength.

Although the invention has been described with reference to specific embodiments, this description is not meant to be construed in a limiting sense. Various modifications of the disclosed embodiments, as well as alternative embodiments of the invention, will become apparent to persons skilled in the art upon reference to the description of the invention. In the simplified example of FIG. 6, there are 54 latches and four LCBs in a single clock domain but those skilled in the art will appreciate that practical applications of the invention may involve hundreds or thousands of clock sinks with larger numbers of latch-LCB clusters and additional clock domains. It is therefore contemplated that such modifications can be made without departing from the spirit or scope of the present invention as defined in the appended claims.

1. A method for designing the layout of a plurality of latches in two ormore clock domains of an integrated circuit carried out by a computersystem, comprising: receiving an input layout of logic cells whichinclude the latches located in a common plane, by executing firstprogram instructions in the computer system; creating a rectangular gridin a region of the integrated circuit containing the latches, the gridhaving grid intersection points defined by a horizontal pitch and avertical pitch, by executing second program instructions in the computersystem; placing clock distribution structures at the grid intersectionpoints in the common plane of the latches, said placing includingbipartite matching of (i) cost and capacity for clock distributionstructures to (ii) corresponding clock domains, by executing thirdprogram instructions in the computer system; and locating the latchesaround the clock distribution structures to minimize an overall distancefor connections between the latches and respective clock distributionstructures, by executing fourth program instructions in the computersystem.
2. The method of claim 1 wherein said locating includes bipartite matching of (i) cost and capacity for latches to (ii) corresponding clock distribution structures.
3. The method of claim 1 wherein at least one of the horizontal and vertical pitches is independently adjusted to achieve a more uniform spread of the clock distribution structures.
4. The method of claim 1 wherein the grid has a perimeter defined by a bounding box surrounding the latches.
5. The method of claim 1 wherein the grid has the same horizontal pitch and vertical pitch.
6. The method of claim 5 wherein the pitches are an integer less than or equal to $h_{\alpha}$ according to the formula $h_{\alpha} = \frac{-(W + L) + \sqrt{(W + L)^{2} + 4WL(l - 1)}}{2(l - 1)}$ where W is the width of the grid, L is the length of the grid, and l is a predetermined number of clock distribution structures to be utilized.
7. A computer system comprising: one or more processors which process program instructions; a memory device connected to said one or more processors; and program instructions residing in said memory device for designing the layout of a plurality of latches in two or more clock domains of an integrated circuit by receiving an input layout of logic cells which include the latches located in a common plane, creating a rectangular grid in a region of the integrated circuit containing the latches wherein the grid has grid intersection points defined by a horizontal pitch and a vertical pitch, placing clock distribution structures at the grid intersection points in the common plane of the latches, said placing including bipartite matching of (i) cost and capacity for clock distribution structures to (ii) corresponding clock domains, and locating the latches around the clock distribution structures to minimize an overall distance for connections between the latches and respective clock distribution structures.
8. The computer system of claim 7 wherein said locating includes bipartite matching of (i) cost and capacity for latches to (ii) corresponding clock distribution structures.
9. The computer system of claim 7 wherein at least one of the horizontal and vertical pitches is independently adjusted to achieve a more uniform spread of the clock distribution structures.
10. The computer system of claim 7 wherein the grid has a perimeter defined by a bounding box surrounding the latches.
11. The computer system of claim 7 wherein the grid has the same horizontal pitch and vertical pitch.
12. A computer program product comprising: a computer-readable storage medium; and program instructions residing in said storage medium for designing the layout of a plurality of latches in two or more clock domains of an integrated circuit by receiving an input layout of logic cells which include the latches located in a common plane, creating a rectangular grid in a region of the integrated circuit containing the latches wherein the grid has grid intersection points defined by a horizontal pitch and a vertical pitch, placing clock distribution structures at the grid intersection points in the common plane of the latches, said placing including bipartite matching of (i) cost and capacity for clock distribution structures to (ii) corresponding clock domains, and locating the latches around the clock distribution structures to minimize an overall distance for connections between the latches and respective clock distribution structures.
13. The computer program product of claim 12 wherein said locating includes bipartite matching of (i) cost and capacity for latches to (ii) corresponding clock distribution structures.
14. The computer program product of claim 12 wherein at least one of the horizontal and vertical pitches is independently adjusted to achieve a more uniform spread of the clock distribution structures.
15. The computer program product of claim 12 wherein the grid has a perimeter defined by a bounding box surrounding the latches.
16. The computer program product of claim 12 wherein the grid has the same horizontal pitch and vertical pitch.
17. The computer program product of claim 16 wherein the pitches are an integer less than or equal to $h_{\alpha}$ according to the formula $h_{\alpha} = \frac{-(W + L) + \sqrt{(W + L)^{2} + 4WL(l - 1)}}{2(l - 1)}$ where W is the width of the grid, L is the length of the grid, and l is a predetermined number of clock distribution structures to be utilized.
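
As a worked illustration of the uniform-pitch bound recited in claims 6 and 17, the following sketch evaluates h_α and rounds it down to an integer pitch. It assumes the numerator of the formula is read as −(W + L) plus the square root, which makes h_α the positive root of (l − 1)h² + (W + L)h − WL = 0. The function names `pitch_bound` and `integer_pitch` are hypothetical, and W and L may be in any consistent unit of length.

```python
import math


def pitch_bound(W, L, l):
    """Upper bound h_alpha on a uniform grid pitch.

    W, L: width and length of the grid; l: number of clock distribution
    structures to be utilized (must exceed 1 for the formula to be defined).
    """
    if l <= 1:
        raise ValueError("the formula requires l > 1")
    s = W + L
    # Assumed reading of the numerator: -(W + L) + sqrt(...).
    return (-s + math.sqrt(s * s + 4.0 * W * L * (l - 1))) / (2.0 * (l - 1))


def integer_pitch(W, L, l):
    """Largest integer pitch not exceeding h_alpha."""
    return math.floor(pitch_bound(W, L, l))
```

For W = L = 100 and l = 4, the square root evaluates to 400, so h_α = 200/6 ≈ 33.3 and the largest admissible integer pitch is 33.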